Exploiting Morphological Regularities in Distributional Word Representations

Authors

Arihant Gupta

Syed Sarfaraz Akhtar

Avijit Vajpayee

Arjit Srivastava

Madan Gopal Jhanwar

Manish Shrivastava

Abstract

We present an unsupervised, language-agnostic approach for exploiting morphological regularities present in high-dimensional vector spaces. We propose a novel method for generating embeddings of words from their morphological variants using morphological transformation operators. We evaluate this approach on the MSR word analogy test set (Mikolov et al., 2013d), achieving an accuracy of 85%, which is 12% higher than the previous best known system.
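The abstract does not say how the morphological transformation operators are built, so the following is only a minimal sketch under a stated assumption: an operator is taken to be the average vector offset between a few known (base, variant) pairs, and it is applied by simple addition. The toy embeddings, the function names (`mean_offset`, `apply_operator`, `nearest`), and the averaging scheme are all hypothetical illustrations, not the authors' method.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
EMBEDDINGS = {
    "walk":   [1.0, 0.0, 0.0],
    "walked": [1.0, 0.0, 1.0],
    "jump":   [0.0, 1.0, 0.0],
    "jumped": [0.0, 1.0, 1.0],
    "talk":   [0.7, 0.7, 0.0],
    "talked": [0.7, 0.7, 1.0],
}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_offset(pairs, emb):
    """Assumed 'operator': the average of (variant - base) over example pairs."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for base, variant in pairs:
        total = add(total, sub(emb[variant], emb[base]))
    return [x / len(pairs) for x in total]


def apply_operator(word, operator, emb):
    """Estimate the embedding of a morphological variant of `word`."""
    return add(emb[word], operator)


def nearest(vector, emb, exclude=()):
    """Vocabulary word whose embedding is most cosine-similar to `vector`."""
    candidates = [w for w in emb if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, emb[w]))


# Build a hypothetical past-tense operator from two example pairs,
# then use it to answer the analogy "talk : ? (past tense)".
past_tense = mean_offset([("walk", "walked"), ("jump", "jumped")], EMBEDDINGS)
predicted = apply_operator("talk", past_tense, EMBEDDINGS)
answer = nearest(predicted, EMBEDDINGS, exclude={"talk"})
print(answer)  # talked
```

With real embeddings the predicted vector will rarely coincide with an existing one, which is why the lookup uses nearest-neighbour search rather than exact matching.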

Similar resources

Orthogonality of Syntax and Semantics within Distributional Spaces

A recent distributional approach to word-analogy problems (Mikolov et al., 2013b) exploits interesting regularities in the structure of the space of representations. Investigating further, we find that performance on this task can be related to orthogonality within the space. Explicitly designing such structure into a neural network model results in representations that decompose into orthogonal...

Quantificational features in distributional word representations

Do distributional word representations encode the linguistic regularities that theories of meaning argue they should encode? We address this question in the case of the logical properties (monotonicity, force) of quantificational words such as everything (in the object domain) and always (in the time domain). Using the vector offset approach to solving word analogies, we find that the skip-gram...

Linguistic Regularities in Sparse and Explicit Word Representations

Recent work has shown that neural-embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.’s method of first adding and subtracting word vectors, and then searching for a word similar to the result, is equivalent to searching for a word that maximizes a linear combination of three p...

Diverse Context for Learning Word Representations

Word representations are mathematical objects that capture a word’s meaning and its grammatical properties in a way that can be read and understood by computers. Word representations map words into equivalence classes such that words that share similar properties to each other are part of the same equivalence class. Word representations are either constructed manually by humans (in the form of ...

Morphological Smoothing and Extrapolation of Word Embeddings

Languages with rich inflectional morphology exhibit lexical data sparsity, since the word used to express a given concept will vary with the syntactic context. For instance, each count noun in Czech has 12 forms (where English uses only singular and plural). Even in large corpora, we are unlikely to observe all inflections of a given lemma. This reduces the vocabulary coverage of methods that i...

Publication date: 2017